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Abstract 

This paper deals with parameter estimation when the data are randomly right censored. The 



?—( ■ maximum likelihood estimates from censored samples are obtained by using the expectation- 

maximization (EM) and Monte Carlo EM (MCEM) algorithms. We introduce the concept of 
the EM and MCEM algorithms and develop parameter estimation methods for a variety of 
distributions such as normal, Laplace and Rayleigh distributions. These proposed methods are 
illustrated with three examples. 

Keywords: EM algorithm, Maximum likelihood. Censored data, Missing data. 



1 Introduction 

The analysis of lifetime or failure time data has been of considerable interest in many branches of 
statistical applications such as electrical engineering, biological sciences, medicine, etc. In reliability 
analysis, censoring is very common because of time and cost considerations on experiments. The 
data are said to be censored when, for observations, only a lower or upper bound on lifetime is 
available. 

The problem of parameter estimation from censored samples has been treated by several au- 
thors. Gupta [5 has studied the maximum likelihood estimate (MLE) and the best linear estimate 
for Type-I and Type-II censored samples from a normal distribution. To obtain the MLE, some 
numerical optimization techniques need to be employed. Unless otherwise specified, "MLE" refers 
to the estimate obtained by direct maximization of the likelihood function. Maximizing the likeli- 
hood function directly using gradient methods such as the Newton- Raphson method is very fast and 
efficient. However, these methods are very sensitive to starting values. Also, the larger the number 
of parameters, the higher the chance that the method will be unable to find the maximizer. In 
addition, if the likelihood function is very flat near its maximum, then the method will stop before 
finding the maximizer. Sultan |10) has given an approximation of the MLE for a Type-II censored 
sample from a normal distribution. Govindarajulu 4 has derived the best linear unbiased estimate 
(BLUE) for a symmetrically Type-II censored sample from a Laplace distribution for N up to 20. 
Balakrishnan [1] has given the BLUE for a Type-II censored sample from a Laplace distribution. 
The BLUEs need the coefficients a.i and bi , which were tabulated in [T] , but the table is provided only 
for sample size A^ = 20. The approximate MLE and the BLUE do not converge to the MLE. The 
methods above are also restricted only to Type-I or Type-II censored samples. These deficiencies 
can be overcome by the proposed methods based on the EM and MCEM algorithms. 

Until the advent of powerful and accessible computing methods, the experimenter was often 
confronted with a difficult choice: Either describe an accurate model of a phenomenon, which would 
usually preclude the computation of explicit answers, or choose a feasible approximate model which 
would avoid computational difficulties, but may not be close to an exact model. In an era of 
powerful computers, simulation-based estimation such as the EM algorithm promises to be one of 
the mainstays of applied parametric modeling and data analysis in the years ahead. 

We develop parameter estimation methods via the EM and MCEM algorithms when the data 
are randomly right censored. This random censoring is a generalization of Type-I censoring. These 
MLEs are obtained using the EM and MCEM algorithms. Since the calculation of the MLEs in 
Type-I and Type-II censoring are nearly identical, the proposed methods can be used for Type-II 



censoring also. We present the estimation methods when the data come from the normal, Laplace 
and Rayleigh distributions 

In section [2l we introduce the likelihood construction for censored data. Section |3] introduces 
the concept of the EM and MCEM algorithms. Section 0] provides estimation procedure. Section [S] 
illustrates examples. 

2 Likelihood construction for censored data 

Suppose that we observe x = (a;i, . . . ,Xn) which are independent and identically distributed (iid) 
and have a continuous distribution with the probability density function (pdf) f{x) and the cumu- 
lative distribution function (cdf) F{x). Data from experiments involving random censoring can be 
conveniently represented by pairs {wi,di) with Wi — min(a;j,_Rj): 

{0 Xi> Ri 
for z = 1, . . . ,n, 
1 x, < R, 

where 5i is a censoring indicator variable and Ri is a censoring time of test unit i. Denote the vector 
of unknown parameters hy = {Oi, . . . , Op). Then ignoring an normalizing constant, we have the 
complete-data likelihood 

n 
i=l 

Denote the observed (uncensored) part oi xi, . . . ,Xn by y = (j/i, . . . , y,„) and the missing (censored) 
part by z = (zm+i, . . . , z„) with Zj > Ri. Integrating L'^(6\x) with respect to z, we obtain the 
observed-data likelihood 



L{e\y) = J L%e\y,z)dz 



=n/(y») n / /(^^)^^^ 

rti n 

I— 1 J— rn+l 

Using the [wi^Si) notation, we have 

n 

i(0|w, 8) = n[/(^*)]'' [1 - F{w,)Y-'^ , (1) 

i=l 

where w = (wi, . . . , Wn) and 6 — (Ji, . . . , (5„). 



For Type-II censoring, the data consist of the rth smaUest hfetimes X(i) < X(2) < • • • < a;(r) 
out of a sample of size n. Assuming that we observe x = (xi, . . . ,Xn) which are iid and have a 
continuous distribution, it follows that the joint pdf of xii\, . . . , xi^) (see [2]) is 



n 



(n — r)\ 



\[f{m) n [i-^(^m)]. 



Ignoring an normalizing constant, we can rewrite the above equation in the form of ([T]) by setting 
Ri — Xf^^y Hence we can use ([T]) for the calculation of the MLE in both Type-I (including random 
censoring) and Type-II censoring. 

3 The EM and MCEM algorithms 

The EM algorithm is a general technique for finding maximum likelihood estimates for parametric 
models when the data are not fully observed. The EM algorithm was originally introduced by 
Dempster et al. 3 to overcome the difficulties in maximizing likelihoods. 

The key idea behind EM algorithm is to solve a difiicult incomplete-data problem by repeat- 
edly solving tractable complete-data problems. The E-step of each iteration only involves taking 
expectations over complete-data conditional distributions and the M-step of each iteration only re- 
quires complete-data maximum likelihood estimation, which is often in simple closed form. For 
incomplete-data problems, the most attractive features of the EM algorithm relative to other op- 
timization techniques are its simplicity and its stability. Rather than maximizing the potentially 
complicated likelihood function of the incomplete data directly, we repeatedly maximize the log- 
likelihood function of the complete data given the incomplete data, which is typically much easier 
and often equivalent to finding MLEs with complete data. Moreover, successive iterations of the 
EM algorithm are guaranteed never to decrease the likelihood function, which is not generally true 
of gradient methods like Newton-Raphson. Hence in the case of the unimodal and concave likeli- 
hood function, the EM algorithm converges to the global maximizer from any starting value. We 
can employ this methodology for parameter estimation from a censored sample since censored data 
models are special cases of missing data models. 

The EM algorithm consists of two distinct steps: 

• E-step: compute 

Qi9\9^''>) = J\ogL%e\y,z)p{z\y,e^'^)dz. 



• M-step: find the 0^'*+^) 

which iTLaximizes Q{6\6^^') in 9. 

A difScuhy with the implementation of the EM algorithm is that each E-Step requires the 
integration of the expected log-likelihood to obtain the Q{9\9'^'). Because of the integration, maxi- 
mizing (3(0|0 ) can be difficult even when maximizing L'^{9\y, z) is trivial. Wei and Tanner |12|ll3j 
propose using the MCEM to overcome this difficulty by simulating Zm+i, . . . , Zn from the conditional 
distribution p{z\y, 9^'^') and then maximizing the approximate expected log-likelihood 

K 

K 



1 ^ 



k=l 



where z^'^^ = (zm+i,fc, ■ • ■ , Zn,k)- 

The books by Little and Rubin 6 , Tanner [TT , and Schafer [5] provide good overview of the 
EM literature. 



4 Parameter estimation 

4.1 The normal distribution 

Let Yi, . . . ,Ym and Z„i+i, . . . , Zn be iid normal random variables with 9 = (yu,cr^). Then the 
complete-data log-likelihood is 



logL=(0|y,z) 

-. m m 

= c - 1 loga^ - — { Y,y1 - 2fiJ2y^ + "^^'} 

1=1 1=1 

Because of the iid structure, the predictive distribution of the missing data given 9 does not 
depend on the observed data. Thus the z^'s are observations from the truncated normal distribution 

p{z\y,9)^p{z\9) 



i—m-\-l i—m-\-l 



where (/){■) and $(■) are pdf and cdf of N{0, 1), respectively. Using the fohowing integral identities 

-4>{ )dz = M$( ) - (T(j){ ), 

CF (7 (7 (7 

cr (7 cr a 

we have the expected log-likelihood at the sth step in the EM sequence: 

= /"logi'=(6>|y,z)p(z|6>(^))dz 

= C-^loga2-— {^y2_2^^y^^^^2| 

i=l i=l 

^C~^\oga^- ^{T2 - 2/iTi + m^2| 
2 2a^ 



1 
2^ 



2{5r^-2^5r^ + (n-™)Mn, 



(s) (s) 

where Ti, T2, S'J , and ^2 are given by 



i=\ 
ni 

S't^^ E / ^^V^^^\Q^''^)dz^ 



z^m+1 



i-*(^^^^<r^) 



Differentiating the expected log-likelihood Q{0\6^''^') with respect to /x and c^ and solving for 
H and (T^, we obtain the EM sequences 

^i^■'+'^^-{T, + s['^}, (3) 

n 

a'^'+'^ = -{T2 + si^^}-^{T, + si''^Y. (4) 



If we instead use the MCEM algorithm by simulating Zm+i, ■ ■ ■ , Zn from the truncated normal 
distribution p(z|y, 6^'^') given by ^, then the Q{6\6^''') is replaced with the approximate expected 
log-likelihood 



K 



lX^logL=(0|y,z«) 
fe=i 

C - 5 log a^ - ^At, + l^i^) - 2m(Ti + 1t//^)) + Vl 



2^2 \-^ ' /f ^2 -PV-^ ' j^' 



where 



K 



and Zi^k is from 



n^^^ = E E -.^' 

/c— 1 2— m+1 
K n 

v^^^ = E E <- 

/e— 1 i— m+1 






for i = TO + 1, . . . , n. We then obtain the MCEM sequences by differentiating the Q{6\0^^') 

M(^+i) = i{ri + ly/^)}, (5) 

n K 

-'''^'' = -{T. + ^V^'']--AT. + ^vt'T. (6) 

71 K -^ n^ K 

This is merely an example of the MCEM algorithm since the ordinary EM algorithm applies. 



4.2 The Laplace distribution 

Let Fl, . . . , Ym and Zm+i, . . . , Zn he iid Laplace random variables with 6 — (/x, a), where the pdf is 

/(^|0) = ^exp(-^-i^), a>0. 

Then the complete-data log-likelihood is 

logL=(0|y,z) 

^ m ^ n 

^C -nloga y^\y^-^J■\ y^ \zt-lJ.\. 

a ^ — ' a ^ — ' 

i— 1 i— 7n+l 



Because of the iid structure, the predictive distribution of the missing data given does not depend 
on the observed data. Thus the z^'s are observations from the truncated Laplace distribution 



p(z|y,0)-Mz|0) 



{zi > Ri), 



where F{-) is the cdf of Laplace random variable. Then at the sth step in the EM sequence, we have 
the expected log-likelihood 

logi'=(6»|y,z)p(z|6>(''')dz 



^ m ^ n /'oo 

log cr y^\y^-^J■\ V / |Zj -^|p(zi|0(''))dZj 

'^ ±:1 <^ ,-±:1 1 Jr, 



C -n 

The computation of the above integration part is very complex. We can overcome this difficulty by 
using MCEM approach. The approximate expected log-likelihood is 



K 



-if^logL^(0|y,zW) 

^ ^ K m n 

C-nlogcr- -— ^|^|y, -^1+ Y^ \z,,k-fi\y 



k—l 2—1 i— m-t-1 

where z*-*^' = {zm+i,k, ■ ■ ■ , Zn.k) is fromp(z|0^*-^). Note that it is easy to simulate a truncated Laplace 
random variable by using the inverse transformation method; see Appendix. Using this, we obtain 
the MCEM sequences: 



M 



median(y ® iC, z'^^, . . . , z*-^') (7) 



i—l k—l i—m-\"l 

where y (E) K — {y, . . . ,y), that is, K replications of y. 

4.3 The Rayleigh distribution 

Let Yi, . . . , Ym and Zm+i, . . . , Zn be iid Rayleigh random variables with the pdf: 

2 

/(x|/3) = -^exp(-^), x>0, /3>0. 



Then the complete-data log-hkehhood is 

m 
' — 9,n.]np^ B -\- \ lopr?/.- — 



log L^(/3|y, z) = C - 2n log /3 + X] log y, - -i^ ^ y2 



2— TTi+l i— m+l 



2 



The predictive distribution of the missing data given f3 does not depend on the observed data. Thus 
the Zi^s are observations from the truncated Rayleigh distribution 

p{z\y,f3) 

Then at the 5th step in the EM sequence, we have the expected log-hkehhood 

logi^(/?|y,z)p(z|/3(«))dz 

m w m 

C -2nlog 13 + J2\ogy,-—J2y! 



E / {^og z,-—^zf)p{z,\l3(^^)dz. 



The calculation of the above integration part does not have a closed form. Using MCEM, we have 
the approximate expected log-likelihood 



= j^Y.^ogL^{P\y,z(''^) 

m _, m 

== C - 2n log /3 + ^ logy, --— ^ 



2/32 ^ y^ 

i=l '^ i=l 

_. K n ^ ^ K n 

+ ]fE E iog^^fc-^]fE E ^Ik' 

fc— 1 i—m-{-l /c— 1 i—m+1 

where z^''' — (^m+i.fe, • • • , -Zn.fc) is from p{z\f3^^'). Using the inverse transformation method, we can 
simulate a truncated Rayleigh random variable X 



X = v/i?2 - 2/32 log jj^ 
where C/ is a uniform (0, 1) random variable. We then obtain the MCEM sequences by differentiating 

QWV') 



where 



T2- 


>:yi 




v^'^ 


K n 


2 



k—1 z— rn+1 

5 Illustrative Examples 

This section provides three numerical examples of parameter estimation for the normal, Laplace and 
Rayleigh distributions using the EM algorithms. 

Example 1: censored normal sample 

Let us consider the data presented earlier by Gupta [S] in which, out of A^ = 10, the largest three 
have been censored. The Type-II right censored sample is as follows: 

L613, L644, 1.663, 1.732, 1.740, 1.763, 1.778. 

In this case, Gupta [S^ computed the estimates of the mean and the standard deviation by three 
different methods, viz. (i) best linear (/t = 1.746,(7 = 0.101), (ii) alternative linear (/t = 1.748, tj = 
0.094), and (iii) maximum likelihood [fi — 1.742.0' — 0.072). His calculation of the MLE seems to 
be incorrect. The new calculation of the MLE \s jl — 1.742 and a = 0.079. 

We use the EM sequences from Q and (|4|) . Table [1] presents the iteration sequence of the 
implementation of the EM algorithm for this problem. Starting values are chosen by (i) taking 
the sample mean and sample variance of the uncensored data (/i'"-* ~ 1.7, a = 0.004) and (ii) 
selecting arbitrary numbers (for example, /i^") = 0, ct^ — 1). We obtain the same result in both 
cases up to the third decimal point after about 10 iterations. 

Next, we use the MCEM sequences from ([5]) and ©. Table [5] presents the iteration sequence 
of the implementation of the MCEM algorithm. The algorithm was run with K — 50,000 for 15 
iterations with different starting values, yielding the same results as the MLE up to the third decimal 
place. 



Table [T] around here 



Table [2] around here 
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Example 2: censored Laplace sample 

Let us consider the data presented earlier by Balakrishnan et al. [1^ (simulated with /i = 50 and 
(7 = 5) in which, out of A^ = 20 observations, the largest two have been censored. The Type-II 
right-censored sample thus obtained is as follows: 

32.00692, 37.75687, 43.84736, 46.26761, 46.90651, 

47.26220, 47.28952, 47.59391, 48.06508, 49.25429, 

50.27790, 50.48675, 50.66167, 53.33585, 53.49258, 

53.56681, 53.98112, 54.94154. 

In this case, Balakrishnan et al. 1 computed the BLUEs of /i and cr as /i = 49.56095 and 
a = 4.81270. The MLE is /i = 49.76609 and a = 4.68761. 

We use the MCEM sequences from ([7]) and ([5]). Table [3] presents the iteration sequence of the 
implementation of the MCEM algorithm. The algorithm was run with K — 50, 000 for 5 iterations 
with the starting value (/i'-'^' = 0, cr'-'^' — 1), yielding the same results as the MLE up to third decimal 
place. When compared to the BLUE, our result is closer to the MLE. 



Table [3] around here 



Example 3: censored Rayleigh sample 

We simulated a data set with /3 = 5 in which, out of A^ = 20 observations, the largest five have been 
censored. The Type-II right censored sample thus obtained is as follows: 

1.950, 2.295, 4.282, 4.339, 4.411, 
4.460, 4.699, 5.319, 5.440, 5.777, 
7.485, 7.620, 8.181, 8.443, 10.627. 

We use the MCEM sequences from ^. Table H] presents the iteration sequence of the imple- 
mentation of the MCEM algorithm. Two different starting values (/3''°' = 1 and Z?'"-* = 100) are 
chosen to show that the MCEM is very insensitive to the choice of starting value. This iteration 
sequence shows that the MCEM converges vary fast. The algorithm was run with iiT = 50, 000 for 
10 iterations. We obtain (3 = 6.1324 and /3 — 6.1332 with different starting values. The MLE is 
/3 — 6.1341. The results are the same as the MLE up to the second decimal place. 



Table |4] around here 
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Appendix 

Simulation of truncated normal random variable 

Let U he a, uniform (0, 1) random variable. For any continuous cdf F{-) if we define the random 
variable X by X = F^^{U), then the random variable has distribution function F{-); see [S]. This 
method is called the inverse transformation method. 

Using this method, we have the following truncated normal random variable X: 

X = fi + cr$-i{((l - $(r))L/ + $(r)}, 
where r = {R — lJi)/(y and U ^ f7(0, 1). 

Simulation of truncated Laplace random variable 

Using the inverse transformation method, we have the following truncated Laplace random variable 
X: 

(i) R>fi 

X = R-a\ogU, 

(ii) R < fi 

j H + alog{2U+{l-U)exp{r)} ii U < H 

[ fi - a\og {2{1 - U) - {1 - U) exp(r)} ii U > H, 

where r={R- fi)/a and H = {1 - eJ)/{2 - eJ). 
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Tabic 1: Iterations of EM with normal sample. 



s 


m(^) 


ais) 


Ai(^) 


ais) 





1.7 


0.0041/2 





1 


1 


1.7358 


0.0702 


1.8467 


0.2968 


2 


1.7397 


0.0754 


1.8058 


0.1931 


3 


1.7411 


0.0775 


1.7761 


0.1370 


4 


1.7418 


0.0784 


1.7593 


0.1070 


5 


1.7420 


0.0788 


1.7504 


0.0919 


6 


1.7421 


0.0790 


1.7459 


0.0848 


7 


1.7422 


0.0791 


1.7439 


0.0816 


8 


1.7422 


0.0791 


1.7429 


0.0802 


9 


1.7422 


0.0791 


1.7425 


0.0796 


10 


1.7422 


0.0791 


1.7424 


0.0793 


11 


1.7422 


0.0791 


1.7423 


0.0792 


12 


1.7422 


0.0791 


1.7423 


0.0792 
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Table 2: Iterations of MCEM {K = 50, 000) with normal sample. 



s 


^i(') 


a(^) 


^i(-) 


a(^) 





1.7 


O.OO4I/2 





1 


1 


1.7363 


0.0708 


1.8472 


0.2976 


2 


1.7398 


0.0756 


1.8061 


0.1938 


3 


1.7412 


0.0777 


1.7763 


0.1375 


4 


1.7417 


0.0784 


1.7593 


0.1070 


5 


1.7420 


0.0788 


1.7503 


0.0918 


6 


1.7421 


0.0789 


1.7459 


0.0847 


7 


1.7422 


0.0791 


1.7439 


0.0816 


8 


1.7422 


0.0791 


1.7429 


0.0802 


9 


1.7423 


0.0792 


1.7426 


0.0796 


10 


1.7423 


0.0792 


1.7424 


0.0794 


11 


1.7423 


0.0792 


1.7423 


0.0793 


12 


1.7423 


0.0792 


1.7423 


0.0793 


13 


1.7422 


0.0792 


1.7422 


0.0792 


14 


1.7422 


0.0791 


1.7422 


0.0791 


15 


1.7423 


0.0792 


1.7423 


0.0792 
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Table 3: Iterations of MCEM {K = 50, 000) with Laplace sample. 



s 


Ai(^) 


„{s) 








1 


1 


49.7661 


4.3189 


2 


49.7661 


4.6493 


3 


49.7661 


4.6844 


4 


49.7661 


4.6884 


5 


49.7661 


4.6882 
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Table 4: Iterations of MCEM {K = 50, 000) with Rayleigh sample. 



I3is) p{s) 



1 100 

1 5.3358 50.2854 

2 5.9450 25.7063 

3 6.0900 13.9290 

4 6.1254 8.7677 

5 6.1321 6.8879 

6 6.1318 6.3287 

7 6.1329 6.1827 

8 6.1353 6.1478 

9 6.1326 6.1358 
10 6.1324 6.1332 
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